WikiPathways Schema Extraction

This notebook demonstrates RDF schema extraction from the WikiPathways SPARQL endpoint. It first looks for an existing VoID (Vocabulary of Interlinked Datasets) description and, if none is found, generates one by querying the endpoint. It then shows some downstream uses of the extracted schema.

Discover or get VoID Schema

No suitable existing VoID found, generating new VoID...
Starting query: class_partitions
Finished query: class_partitions (took 0.51s)
Starting query: property_partitions
Finished query: property_partitions (took 2.83s)
Starting query: datatype_partitions
Finished query: datatype_partitions (took 35.99s)
VoID description saved to /home/runner/work/rdfsolve/rdfsolve/notebooks/schema_extraction/../../docs/notebooks/wikipathways/wikipathways_generated_void.ttl
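
The log above shows three partition queries being run, but not their text. As a minimal sketch only, a class-partition query could count instances per class and be encoded as a SPARQL Protocol GET request as follows. The endpoint URL, the query text, and the helper name `build_request_url` are assumptions for illustration and are not taken from the notebook; nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint URL; substitute the endpoint actually in use.
ENDPOINT = "https://sparql.wikipathways.org/sparql"

# Hypothetical class-partition query: one row per class with its instance
# count, which maps onto void:classPartition / void:class / void:entities.
CLASS_PARTITIONS_QUERY = """
SELECT ?class (COUNT(DISTINCT ?s) AS ?entities)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?entities)
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL using the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = build_request_url(ENDPOINT, CLASS_PARTITIONS_QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded)
```

Property and datatype partitions would need analogous queries grouped by predicate or by literal datatype. The timings in the log suggest those are considerably more expensive than the class query.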
| | subject_class | subject_uri | property | property_uri | object_class | object_uri |
|---|---|---|---|---|---|---|
| 22 | owl:Class | http://www.w3.org/2002/07/owl#Class | rdfs:subClassOf | http://www.w3.org/2000/01/rdf-schema#subClassOf | owl:Class | http://www.w3.org/2002/07/owl#Class |
| 23 | owl:ObjectProperty | http://www.w3.org/2002/07/owl#ObjectProperty | rdfs:subClassOf | http://www.w3.org/2000/01/rdf-schema#subClassOf | owl:ObjectProperty | http://www.w3.org/2002/07/owl#ObjectProperty |
| 24 | owl:Class | http://www.w3.org/2002/07/owl#Class | skos:inScheme | http://www.w3.org/2004/02/skos/core#inScheme | skos:ConceptScheme | http://www.w3.org/2004/02/skos/core#ConceptScheme |
| 25 | owl:Class | http://www.w3.org/2002/07/owl#Class | skos:inScheme | http://www.w3.org/2004/02/skos/core#inScheme | owl:Ontology | http://www.w3.org/2002/07/owl#Ontology |
| 26 | owl:DatatypeProperty | http://www.w3.org/2002/07/owl#DatatypeProperty | skos:inScheme | http://www.w3.org/2004/02/skos/core#inScheme | skos:ConceptScheme | http://www.w3.org/2004/02/skos/core#ConceptScheme |
| ... | ... | ... | ... | ... | ... | ... |
| 850 | PublicationReference | http://vocabularies.wikipathways.org/wp#Public... | foaf:page | http://xmlns.com/foaf/0.1/page | PublicationReference | http://vocabularies.wikipathways.org/wp#Public... |
| 856 | owl:Axiom | http://www.w3.org/2002/07/owl#Axiom | owl:annotatedSource | http://www.w3.org/2002/07/owl#annotatedSource | owl:ObjectProperty | http://www.w3.org/2002/07/owl#ObjectProperty |
| 857 | owl:Axiom | http://www.w3.org/2002/07/owl#Axiom | owl:annotatedSource | http://www.w3.org/2002/07/owl#annotatedSource | owl:Class | http://www.w3.org/2002/07/owl#Class |
| 866 | owl:Axiom | http://www.w3.org/2002/07/owl#Axiom | owl:annotatedProperty | http://www.w3.org/2002/07/owl#annotatedProperty | owl:AnnotationProperty | http://www.w3.org/2002/07/owl#AnnotationProperty |
| 868 | owl:Axiom | http://www.w3.org/2002/07/owl#Axiom | oboinowl:hasSynonymType | http://www.geneontology.org/formats/oboInOwl#h... | owl:AnnotationProperty | http://www.w3.org/2002/07/owl#AnnotationProperty |

833 rows × 6 columns
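
Each row of this table is a schema pattern: a subject class, a property, and an object class. As a rough sketch of how such patterns can be derived, the following pure-Python example walks over a handful of toy triples and emits one pattern for every triple whose subject and object are both typed. The triples, the `ex:` identifiers, and the function name `extract_patterns` are made up for illustration; the notebook itself derives its patterns from the VoID description.

```python
RDF_TYPE = "rdf:type"

# Toy triples (subject, predicate, object); all identifiers are invented.
triples = [
    ("wp:DataNode", RDF_TYPE, "owl:Class"),
    ("wp:Pathway", RDF_TYPE, "owl:Class"),
    ("ex:node1", RDF_TYPE, "wp:DataNode"),
    ("ex:node2", RDF_TYPE, "wp:DataNode"),
    ("ex:pw1", RDF_TYPE, "wp:Pathway"),
    ("ex:node1", "dcterms:isPartOf", "ex:pw1"),
    ("ex:node2", "dcterms:isPartOf", "ex:pw1"),
    ("ex:pw1", "dcterms:isPartOf", "ex:pw1"),
]


def extract_patterns(triples):
    """Return the sorted (subject_class, property, object_class) patterns."""
    # Map each typed resource to the set of classes it is an instance of.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    patterns = set()
    for s, p, o in triples:
        # Only triples whose subject and object are both typed yield a pattern.
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                patterns.add((subject_class, p, object_class))
    return sorted(patterns)


for pattern in extract_patterns(triples):
    print(pattern)
```

Because the toy classes are themselves typed as `owl:Class`, the sketch also yields `rdf:type` patterns of the same kind that appear in the extracted schema.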

| | subject_class | subject_uri | property | property_uri | object_class | object_uri |
|---|---|---|---|---|---|---|
| count | 833 | 833 | 833 | 833 | 833 | 833 |
| unique | 35 | 37 | 47 | 47 | 35 | 37 |
| top | DataNode | http://vocabularies.wikipathways.org/wp#DataNode | dcterms:isPartOf | http://purl.org/dc/terms/isPartOf | DataNode | http://vocabularies.wikipathways.org/wp#DataNode |
| freq | 84 | 79 | 167 | 167 | 113 | 109 |

Class Partition Coverage Analysis

Query the endpoint again to count how many instances of each "shape" occur in the dataset.

Counting instances per class...
Getting class mappings...
Calculating coverage statistics...
Coverage analysis exported to: /home/runner/work/rdfsolve/rdfsolve/notebooks/schema_extraction/../../docs/notebooks/wikipathways/wikipathways_coverage.csv
Saved to: /home/runner/work/rdfsolve/rdfsolve/notebooks/schema_extraction/../../docs/notebooks/wikipathways/wikipathways_coverage.csv

Schema Pattern Coverage Analysis

For each subject class, divide the number of entities that participate in a schema pattern by the total number of entities of that class. The resulting coverage ratio shows what percentage of entities actually use each relationship pattern.
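
The ratio described above can be sketched in a few lines of pure Python. This is an illustration on invented toy triples, not the notebook's implementation: the `ex:` identifiers and the function name `pattern_coverage` are hypothetical, and the real analysis runs against the SPARQL endpoint.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Toy data, invented for illustration: three DataNodes and one Pathway.
triples = [
    ("ex:n1", RDF_TYPE, "DataNode"),
    ("ex:n2", RDF_TYPE, "DataNode"),
    ("ex:n3", RDF_TYPE, "DataNode"),
    ("ex:pw1", RDF_TYPE, "Pathway"),
    ("ex:n1", "dcterms:isPartOf", "ex:pw1"),
    ("ex:n2", "dcterms:isPartOf", "ex:pw1"),
    ("ex:n1", "source", "ex:n2"),
]


def pattern_coverage(triples):
    """Map each (subject_class, property, object_class) to a coverage percent."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Denominator: total number of entities per class.
    class_totals = defaultdict(int)
    for classes in types.values():
        for cls in classes:
            class_totals[cls] += 1

    # Numerator: distinct subjects that participate in each pattern.
    participants = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                participants[(subject_class, p, object_class)].add(s)

    return {
        pattern: 100.0 * len(subjects) / class_totals[pattern[0]]
        for pattern, subjects in participants.items()
    }


coverage = pattern_coverage(triples)
average = sum(coverage.values()) / len(coverage)
above_half = sum(1 for value in coverage.values() if value > 50)
print(f"Average pattern coverage: {average:.1f}%")
print(f"Patterns with >50% coverage: {above_half}/{len(coverage)}")
```

Counting distinct subjects rather than triples keeps an entity with several links of the same kind from being counted more than once, so a pattern's coverage cannot exceed 100%.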

| | subject_class | property | object_class | coverage_percent |
|---|---|---|---|---|
| 4 | owl:DatatypeProperty | skos:inScheme | skos:ConceptScheme | 100.0 |
| 371 | DirectedInteraction | source | DataNode | 100.0 |
| 337 | TranscriptionTranslation | rdf:type | owl:Class | 100.0 |
| 336 | Stimulation | rdf:type | owl:Class | 100.0 |
| 338 | Translocation | rdf:type | owl:Class | 100.0 |
| 67 | ComplexBinding | dcterms:isPartOf | Pathway | 100.0 |
| 65 | Complex | dcterms:isPartOf | skos:Collection | 100.0 |
| 335 | Rna | rdf:type | owl:Class | 100.0 |
| 327 | DataNode | rdf:type | owl:Class | 100.0 |
| 328 | GeneProduct | rdf:type | owl:Class | 100.0 |
Average pattern coverage: 25.2%
Patterns with >50% coverage: 177/833
Exported to: /home/runner/work/rdfsolve/rdfsolve/notebooks/schema_extraction/../../docs/notebooks/wikipathways/wikipathways_pattern_coverage.csv

LinkML

LinkML saved to /home/runner/work/rdfsolve/rdfsolve/notebooks/schema_extraction/../../docs/notebooks/wikipathways/wikipathways_linkml_schema.yaml
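
To give a feel for what such a schema contains, the sketch below turns a few of the patterns shown earlier into a LinkML-style dictionary with `classes` and `slots` sections. It is a simplified illustration, not the converter used by the notebook: the function names, the placeholder schema `id`, and the choice to keep the first range seen for each slot are all assumptions. The naming rules (dropping the prefix for class names, replacing `:` with `_` for slot names) are inferred from the generated names in this document.

```python
def to_class_name(curie: str) -> str:
    """Drop the prefix: 'owl:Class' -> 'Class'; unprefixed names pass through."""
    return curie.split(":", 1)[-1]


def to_slot_name(curie: str) -> str:
    """Make a property usable as an identifier: 'dcterms:isPartOf' -> 'dcterms_isPartOf'."""
    return curie.replace(":", "_")


def patterns_to_schema(patterns, name="example"):
    """Build a LinkML-style dict from (subject_class, property, object_class) rows."""
    classes, slots = {}, {}
    for subject_class, prop, object_class in patterns:
        slot = to_slot_name(prop)
        cls = classes.setdefault(to_class_name(subject_class), {"slots": []})
        if slot not in cls["slots"]:
            cls["slots"].append(slot)
        # Object classes are declared too, even if they have no slots yet.
        classes.setdefault(to_class_name(object_class), {"slots": []})
        # Assumption: a slot keeps the first range encountered.
        slots.setdefault(slot, {"range": to_class_name(object_class)})
    return {
        "id": f"https://example.org/{name}",  # placeholder identifier
        "name": name,
        "classes": classes,
        "slots": slots,
    }


# Pattern rows copied from the tables in this document.
patterns = [
    ("owl:Axiom", "owl:annotatedSource", "owl:ObjectProperty"),
    ("owl:Axiom", "owl:annotatedSource", "owl:Class"),
    ("DirectedInteraction", "source", "DataNode"),
    ("ComplexBinding", "dcterms:isPartOf", "Pathway"),
]

schema = patterns_to_schema(patterns)
print(f"Classes = {len(schema['classes'])} Slots = {len(schema['slots'])}")
```

Keeping a single range per slot loses information when a property points at several classes, as `owl:annotatedSource` does here. LinkML's `slot_usage` or `any_of` constructs are the usual way to express per-class or multi-class ranges.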

Mermaid diagram for LinkML Schema

Parsed LinkML schema: Classes = 39 Slots = 47

LinkML Pydantic Model Generation

Found 37 Pydantic model classes for schema.
All 37 generated Pydantic classes:

=== Anchor ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]

=== AnnotationProperty ===
  rdfs_subPropertyOf: typing.Optional[AnnotationProperty]

=== Axiom ===
  owl_annotatedSource: typing.Optional[ObjectProperty]
  owl_annotatedProperty: typing.Optional[AnnotationProperty]
  oboinowl_hasSynonymType: typing.Optional[AnnotationProperty]

=== Binding ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]
  dcterms_bibliographicCitation: typing.Optional[PublicationXref]

=== Catalysis ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]

=== Class ===
  rdfs_subClassOf: typing.Optional[Class]
  skos_inScheme: typing.Optional[ConceptScheme]
  oboinowl_inSubset: typing.Optional[AnnotationProperty]
  iao_0100001: typing.Optional[Class]
  owl_disjointWith: typing.Optional[Class]

=== Collection ===
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  cito_cites: typing.Optional[PublicationReference]
  dc_identifier: typing.Optional[DataNode]
  diseaseOntologyTag: typing.Optional[Class]
  foaf_img: typing.Optional[Image]
  ontologyTag: typing.Optional[Class]
  pathwayOntologyTag: typing.Optional[Class]

=== Comment ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]

=== Complex ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbReactome: typing.Optional[DataNode]
  dc_identifier: typing.Optional[DataNode]
  dcterms_bibliographicCitation: typing.Optional[PublicationXref]
  bdbComplexPortal: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]

=== ComplexBinding ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_bibliographicCitation: typing.Optional[PublicationXref]

=== Conversion ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]

=== DataNode ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbChEBI: typing.Optional[Metabolite]
  bdbHmdb: typing.Optional[DataNode]
  bdbReactome: typing.Optional[DataNode]
  dc_identifier: typing.Optional[DataNode]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]
  dcterms_bibliographicCitation: typing.Optional[PublicationXref]
  bdbComplexPortal: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]
  pav_hasVersion: typing.Optional[Pathway]
  bdbLipidMaps: typing.Optional[DataNode]
  dc_creator: typing.Optional[Person]
  bdbKeggCompound: typing.Optional[DataNode]
  bdbChemspider: typing.Optional[Metabolite]
  rdfs_seeAlso: typing.Optional[DataNode]
  bdbInChIKey: typing.Optional[DataNode]

=== DatatypeProperty ===
  skos_inScheme: typing.Optional[ConceptScheme]
  rdfs_range: typing.Optional[Class]

=== DirectedInteraction ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]
  bdbChEBI: typing.Optional[Metabolite]
  bdbHmdb: typing.Optional[DataNode]
  bdbReactome: typing.Optional[DataNode]

=== GeneProduct ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbChEBI: typing.Optional[Metabolite]
  bdbReactome: typing.Optional[DataNode]
  dc_identifier: typing.Optional[DataNode]
  bdbComplexPortal: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]
  pav_hasVersion: typing.Optional[Pathway]
  dc_creator: typing.Optional[Person]
  bdbKeggCompound: typing.Optional[DataNode]

=== GraphicalLine ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]
  hasPoint: typing.Optional[Point]
  hasAnchor: typing.Optional[Anchor]

=== Group ===
  rdf_type: typing.Optional[Class]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]

=== InfoBox ===
  rdf_type: typing.Optional[Class]

=== Inhibition ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]
  bdbChEBI: typing.Optional[Metabolite]
  bdbHmdb: typing.Optional[DataNode]

=== Interaction ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]
  bdbChEBI: typing.Optional[Metabolite]
  bdbHmdb: typing.Optional[DataNode]
  bdbReactome: typing.Optional[DataNode]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]
  hasPoint: typing.Optional[Point]
  hasAnchor: typing.Optional[Anchor]
  dcterms_bibliographicCitation: typing.Optional[PublicationXref]

=== Label ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]

=== LinkMLMeta ===
  root: dict[str, typing.Any]

=== Metabolite ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbChEBI: typing.Optional[Metabolite]
  bdbHmdb: typing.Optional[DataNode]
  bdbReactome: typing.Optional[DataNode]
  dc_identifier: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbUniprot: typing.Optional[Complex]
  bdbLipidMaps: typing.Optional[DataNode]
  bdbKeggCompound: typing.Optional[DataNode]
  bdbChemspider: typing.Optional[Metabolite]
  rdfs_seeAlso: typing.Optional[DataNode]
  bdbInChIKey: typing.Optional[DataNode]

=== ObjectProperty ===
  rdfs_subClassOf: typing.Optional[Class]
  skos_inScheme: typing.Optional[ConceptScheme]
  rdfs_range: typing.Optional[Class]
  rdfs_domain: typing.Optional[Class]

=== Pathway ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  cito_cites: typing.Optional[PublicationReference]
  dc_identifier: typing.Optional[DataNode]
  diseaseOntologyTag: typing.Optional[Class]
  foaf_img: typing.Optional[Image]
  ontologyTag: typing.Optional[Class]
  pathwayOntologyTag: typing.Optional[Class]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]
  pav_hasVersion: typing.Optional[Pathway]
  dc_creator: typing.Optional[Person]

=== Point ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]

=== Protein ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbChEBI: typing.Optional[Metabolite]
  bdbReactome: typing.Optional[DataNode]
  dc_identifier: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]

=== PublicationReference ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  cito_cites: typing.Optional[PublicationReference]
  foaf_page: typing.Optional[PublicationReference]

=== PublicationXref ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]

=== Restriction ===
  owl_someValuesFrom: typing.Optional[Class]
  owl_onProperty: typing.Optional[ObjectProperty]

=== Rna ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  bdbChEBI: typing.Optional[Metabolite]
  dc_identifier: typing.Optional[DataNode]
  bdbEntrezGene: typing.Optional[Complex]
  bdbEnsembl: typing.Optional[Rna]
  bdbHgncSymbol: typing.Optional[DataNode]
  bdbUniprot: typing.Optional[Complex]
  bdbKeggCompound: typing.Optional[DataNode]

=== RootModel ===
  root: ~RootModelRootType

=== Shape ===
  dcterms_isPartOf: typing.Optional[Pathway]
  rdf_type: typing.Optional[Class]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]

=== State ===
  rdf_type: typing.Optional[Class]
  hasPublicationXref: typing.Optional[PublicationReference]
  hasComment: typing.Optional[Comment]
  stateOf: typing.Optional[DataNode]

=== Stimulation ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]

=== TranscriptionTranslation ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  dcterms_references: typing.Optional[PublicationReference]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]

=== Translocation ===
  dcterms_isPartOf: typing.Optional[Pathway]
  participants: typing.Optional[Complex]
  rdf_type: typing.Optional[Class]
  source: typing.Optional[DataNode]
  target: typing.Optional[DataNode]
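
Each block in the listing describes one generated class whose fields are all optional references to other classes. As a rough illustration of that shape, the sketch below rebuilds the `Anchor` entry with standard-library dataclasses as a stand-in for Pydantic models. The generated module itself is not shown in this document, so the trimmed `Class` and `Pathway` definitions and the helper `field_names` are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Class:
    """Trimmed stand-in for the generated `Class` model."""
    rdfs_subClassOf: Optional["Class"] = None


@dataclass
class Pathway:
    """Trimmed stand-in for the generated `Pathway` model."""
    dcterms_isPartOf: Optional["Pathway"] = None
    rdf_type: Optional[Class] = None


@dataclass
class Anchor:
    """Mirrors the two fields listed under `=== Anchor ===`."""
    dcterms_isPartOf: Optional[Pathway] = None
    rdf_type: Optional[Class] = None


def field_names(cls) -> list:
    """Return the declared field names of a dataclass, in definition order."""
    return [f.name for f in fields(cls)]


# Every field is optional, so an empty instance is valid.
empty = Anchor()
linked = Anchor(dcterms_isPartOf=Pathway(), rdf_type=Class())
print(field_names(Anchor))
```

Unlike Pydantic models, dataclasses do not validate field values at runtime, so this stand-in only conveys the structure of the generated classes, not their behavior.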

Export Formats